95,053 research outputs found

    A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics

    Get PDF
    In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response YY to a single index XTβ∗X^T\beta^* of explanatory variables X∈RdX\in\mathbb{R}^d. The generalized regression model is flexible and covers many widely used statistical models. It fits the data generating mechanisms well in many real problems, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended to deal with the generalized regression model, giving root-nn consistent estimators of β∗\beta^*. Applications of these estimators to high dimensional data, however, are questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative in real data.Comment: 53 page

    Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance

    Get PDF
    [Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features) in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of the point using the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive top–down, bottom–up and random search methods, and the existing outlier detection methods cannot fulfill this new task effectively

    Learning Convolutional Text Representations for Visual Question Answering

    Full text link
    Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our "CNN Inception + Gate" model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering.Comment: Conference paper at SDM 2018. https://github.com/divelab/sva
    • …
    corecore